Speech/Written Corpus
Language Type:
Multilingual
Languages:
Dutch English French German Italian Portuguese Romanian Spanish
Availability:
Freely Available
License:
Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License
Size:
None
Production Status:
Newly created-finished
Use:
Machine Translation, Speech-to-Speech Translation
- Paper title: MuST-Cinema: a Speech-to-Subtitles corpus
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Alina Karakanta | MuST-Cinema | N/A |
Documentation:
Documentation publicly available in English
Written Corpus
Language Type:
Monolingual
Languages:
German
Availability:
Freely Available
License:
Creative Commons Zero copyright waiver (CC0)
Size:
298,352 tokens
Production Status:
Newly created-in progress
Use:
Anaphora, Coreference
- Paper title: GerDraCor-Coref: A Coreference Corpus for Dramatic Texts in German
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Janis Pagel | GerDraCor-Coref | N/A |
Documentation:
https://github.com/quadrama/gerdracor-coref/blob/gold/README.md
Written Corpus
Language Type:
Monolingual
Languages:
German
Availability:
Freely Available
License:
Creative Commons
Size:
304,286 words
Production Status:
Newly created-finished
Use:
Machine Learning
- Paper title: How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR
- Paper track: Evaluation/oral presentation
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Phillip Benjamin Ströbel | NZZ Black Letter Ground Truth | N/A |
Documentation:
English documentation is freely available on the GitHub page.
Written Corpus
Language Type:
Monolingual
Languages:
German
Availability:
Freely Available
License:
MIT
Size:
5,355,043 entries
Production Status:
Newly created and combined with existing
Use:
Document Classification, Text categorisation
- Paper title: Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems
- Paper track: Written/poster presentation with demo
- Paper status: Accept Poster+Demo
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Oliver Guhr | Broad-Coverage German Sentiment Classification for Dialog Systems | N/A |
Documentation:
None
Written Corpus
Language Type:
Bilingual
Languages:
English German
Availability:
Freely Available
License:
Creative Commons
Size:
1,627 summaries (Other)
Production Status:
Newly created-finished
Use:
Summarisation
- Paper title: Summarization Beyond News: The Automatically Acquired Fandom Corpora
- Paper track: Evaluation/oral presentation
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Benjamin Hättasch | FandomCorpora | N/A |
Documentation:
README on the project page and in the repository
Written Corpus
Language Type:
Multilingual
Languages:
English German Spanish
Availability:
Freely Available
License:
Size:
None
Production Status:
Newly created-finished
Use:
Semantic Role Labeling
- Paper title: WikiBank: Using Wikidata to Improve Multilingual Frame-Semantic Parsing
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Meriem Beloucif | WikiBank | N/A |
Documentation:
None
Written Corpus
Language Type:
Multilingual
Languages:
English Farsi French German Japanese
Availability:
Freely Available
License:
Size:
4.5 MB
Production Status:
Existing-updated
Use:
Document Classification, Text categorisation
- Paper title: Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Hassan S. Shavarani | Shinra-5LDS Dataset | N/A |
Documentation:
None
Speech Corpus
Language Type:
Multilingual
Languages:
Czech English French German
Availability:
Freely Available
License:
Not known yet
Size:
2 hours
Production Status:
Newly created-in progress
Use:
Language Identification
- Paper title: Detecting English Speech in the Air Traffic Control Voice Communication
- Paper track: 14.7 Automatic Speech Recognition in Air Traffic M/Poster Presentation
- Paper status: Accept
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Igor Szoke | ATCO2 ATC dataset | N/A |
Documentation:
Not yet available
Speech/Written Corpus
Language Type:
Multilingual
Languages:
Arabic English French German Greek Italian Portuguese Russian Spanish
Availability:
Freely Available
License:
CC BY-NC-ND 4.0
Size:
200
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: The Multilingual TEDx Corpus for Speech Recognition and Translation
- Paper track: 12.6 Speech and multimodal resources/Oral Presentation
- Paper status: Accept
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Elizabeth Salesky | Multilingual TEDx (mTEDx) | N/A |
Documentation:
None
Speech Corpus
Language Type:
Monolingual
Languages:
German
Availability:
From Owner
License:
Size:
35,000 audio recordings (Other)
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Self-paced ensemble learning for speech and audio classification
- Paper track: 8.6 Neural network training methods (including new/Oral Presentation
- Paper status: Accept
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Radu Tudor Ionescu | Mask Augsburg Corpus | N/A |
Documentation:
None